Development and validation of a nomogram to predict long-term cancer-specific survival for patients with osteosarcoma

The present work aimed to establish a new model to accurately estimate overall survival (OS) and cancer-specific survival (CSS) in patients with osteosarcoma. Osteosarcoma cases diagnosed between 2004 and 2017 were collected from the Surveillance, Epidemiology, and End Results (SEER) database and randomly assigned to training or validation sets. OS- and CSS-related variables were then identified through multivariate Cox regression analysis and used to develop nomograms predicting 1-, 3- and 5-year OS and CSS. The concordance index (C-index), calibration curves, and decision curve analysis (DCA) were used to assess the predictive performance of the nomograms for 1-, 3- and 5-year OS and CSS. Altogether, 1725 osteosarcoma cases were enrolled and randomly divided into training (n = 1149, 70%) and validation (n = 576, 30%) sets. Univariate and multivariate Cox regression analyses identified age, grade, T stage, M stage, surgery, chemotherapy, and histological type as independent prognostic factors for OS and CSS among patients with osteosarcoma. Based on the results of the multivariate Cox regression analysis, we constructed the OS and CSS prediction nomograms. In the training set, the C-index was 0.806 (95% CI 0.769–0.836) for the OS nomogram and 0.807 (95% CI 0.769–0.836) for the CSS nomogram. In the validation set, the C-index was 0.818 (95% CI 0.789–0.847) for the OS nomogram and 0.804 (95% CI 0.773–0.835) for the CSS nomogram. Calibration curves for 3- and 5-year CSS showed close agreement between predicted and observed values in both the training set and the validation set. The nomograms outperformed TNM staging in prediction.
Our nomogram is simple, reliable, and practical; it efficiently predicts OS and CSS for patients with osteosarcoma and can assist clinicians in assessing individual prognosis and making treatment decisions.

www.nature.com/scientificreports/
classification system alone, which may not comprehensively account for clinicopathological variables such as sex, age, race, or other factors, and is commonly employed to predict the prognosis of osteosarcoma in the extremities. It may not be suitable for tumors at axial locations. A nomogram is a statistical tool that estimates the probability of a clinical event by assigning a weight to each contributing factor 9,10 . Nomograms have recently been widely used to predict survival in diverse cancers [11][12][13][14][15] . In recent years, large open-access datasets of cancer patients and bioinformatics methods have made it possible to explore independent risk factors for cancer prognosis [16][17][18] . The publicly accessible SEER database includes cancer patient data from 18 registries covering approximately 28% of the US population 19,20 . This work was conducted to construct a reliable nomogram for predicting overall survival (OS) and cancer-specific survival (CSS) of osteosarcoma cases 21 , thereby helping clinicians provide better-tailored treatment options to reduce the rate of metastasis and improve patient survival.

Materials and methods
Patient selection. The osteosarcoma cases in this work were selected from the SEER database, and their anonymized clinical data were extracted. We used the SEER*Stat software developed by the National Cancer Institute (version 8.3.6, https://seer.cancer.gov/seerstat/) to access the SEER 18 Regs Custom Data (with additional treatment fields), Nov 2018 Sub (covering 2004-2016 data). Osteosarcoma cases meeting the following inclusion criteria were selected: (1) diagnosis of osteosarcoma as the primary malignant tumor from 1983 to 2014, based on International Classification of Diseases for Oncology (ICD-O) codes 9180-9187 and 9192-9194 22 ; (2) histologically confirmed osteosarcoma; (3) osteosarcoma located in the extremities (long/short bones of the four limbs) or at an axial location (skull, ribs, spine, and pelvis); (4) known histological type; (5) available survival time or identified cause of death after diagnosis.
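The inclusion criteria above amount to a record-level filter. The Python sketch below illustrates that filter under stated assumptions: the `Record` fields and site labels are hypothetical stand-ins for SEER*Stat export columns (real exports use ICD-O-3 topography codes and SEER-specific variable names), and the diagnosis-year window is omitted. Only the histology code ranges are taken from the text.

```python
from dataclasses import dataclass
from typing import List, Optional

# ICD-O-3 histology codes named in criterion (1): 9180-9187 and 9192-9194.
OSTEOSARCOMA_CODES = set(range(9180, 9188)) | set(range(9192, 9195))

# Hypothetical site labels standing in for SEER primary-site codes.
EXTREMITY_SITES = {"upper_limb", "lower_limb"}
AXIAL_SITES = {"skull", "ribs", "spine", "pelvis"}


@dataclass
class Record:
    histology_code: int
    histologically_confirmed: bool
    site: str
    histologic_type_known: bool
    survival_months: Optional[int]  # None when survival time is unavailable


def is_eligible(r: Record) -> bool:
    """Return True if a record meets the inclusion and exclusion criteria."""
    return (
        r.histology_code in OSTEOSARCOMA_CODES          # criterion (1)
        and r.histologically_confirmed                  # criterion (2)
        and r.site in (EXTREMITY_SITES | AXIAL_SITES)   # criterion (3)
        and r.histologic_type_known                     # criterion (4)
        and r.survival_months is not None               # criterion (5) and exclusion
    )


def select_cohort(records: List[Record]) -> List[Record]:
    """Keep only eligible records."""
    return [r for r in records if is_eligible(r)]
```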
Exclusion criteria. Patients whose survival time was unavailable or unclear were excluded from this study.
Patient clinicopathological characteristics, including age, sex, race, grade, histological type, tumor site, tumor size, surgery, stage of surgery, chemotherapy, radiotherapy, and survival time, were extracted. Age was divided into 0-24, 25-59, and > 59 years groups. Race was classified as black, white, or other (American Indian/Alaska Native, Asian/Pacific Islander). Tumor sites were classified as extremity (long/short bones of the four limbs) or axial (skull, ribs, spine, and pelvis). Tumors were classified into three sizes (≤ 89 mm, small; 89-139 mm, intermediate; and > 140 mm, large). Low-grade tumors comprised well- and moderately differentiated grades (ICD-O-3 grades I and II), whereas high-grade tumors comprised poorly differentiated and undifferentiated grades (ICD-O-3 grades III and IV). Surgery, chemotherapy, and radiotherapy were each coded as yes or no.
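The category definitions above can be expressed as small recoding functions. This is a sketch, not the authors' code. In particular, the stated size cut-offs overlap at 89 mm and leave 140 mm unassigned, so the sketch assumes sizes recorded in integer millimetres, with 90-139 mm treated as intermediate and 140 mm or more as large.

```python
def age_group(age_years: int) -> str:
    """Age bands as stated: 0-24, 25-59, and > 59 years."""
    if age_years <= 24:
        return "0-24"
    if age_years <= 59:
        return "25-59"
    return ">59"


def size_group(size_mm: int) -> str:
    """ASSUMPTION: <= 89 small, 90-139 intermediate, >= 140 large (integer mm)."""
    if size_mm <= 89:
        return "small"
    if size_mm <= 139:
        return "intermediate"
    return "large"


def grade_group(icd_o3_grade: int) -> str:
    """ICD-O-3 grades I-II map to low grade; grades III-IV map to high grade."""
    if icd_o3_grade not in (1, 2, 3, 4):
        raise ValueError(f"unknown ICD-O-3 grade: {icd_o3_grade}")
    return "low" if icd_o3_grade <= 2 else "high"
```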

Statistical analyses.
The prediction models and nomograms were built as follows. First, all patients were randomly assigned to a training or validation set at a 7:3 ratio. The nomograms were developed using the training cohort and validated in the validation cohort. Second, hazard ratios (HRs) and 95% confidence intervals (CIs) were estimated using univariate and multivariate Cox regression models to assess the independent contribution of each variable to OS and CSS. Third, variables found significant in the univariate analysis were entered into the multivariate analysis to develop nomograms predicting 1-, 3- and 5-year OS and CSS. An additional nomogram was constructed based on the TNM stage. MedCalc software, version 15.2.0 (MedCalc Software, Mariakerke, Belgium), was then used to generate receiver operating characteristic (ROC) curves for the two nomograms and to calculate the respective areas under the curve (AUC). The C-index and calibration curves (1000 bootstrap resamples) were used to evaluate nomogram performance. The C-index ranges from 0.5 to 1.0, where 0.5 indicates random chance and 1.0 indicates perfect discrimination. Finally, the net benefit of the nomograms was evaluated using decision curve analysis (DCA).
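The C-index described above is usually Harrell's concordance index for right-censored data. A minimal pure-Python sketch follows, assuming that higher `risk_scores` mean higher predicted risk (for example, a Cox linear predictor or nomogram total points). Tied follow-up times are simply skipped here, a simplification that dedicated survival-analysis libraries handle more carefully; this is not the software used in the study.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk_scores: Sequence[float]) -> float:
    """Harrell's C for right-censored survival data.

    A pair is comparable when the patient with the shorter follow-up had the
    event. It is concordant when that patient also has the higher predicted
    risk; ties in predicted risk count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # tied times (skipped for simplicity) or censored earlier patient
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

For four uncensored patients whose predicted risk falls as follow-up lengthens, `harrell_c_index([2, 4, 6, 8], [1, 1, 1, 1], [4, 3, 2, 1])` returns 1.0 (perfect discrimination), while constant risk scores return 0.5 (chance). The pairwise loop is O(n²), which is acceptable for a cohort of this size.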

Results
Demographic and pathologic characteristics. Overall, 1725 cases were included in this study and randomly assigned to the training (n = 1149) or validation (n = 576) set. Figure 1 presents the patient inclusion process. Table 1 displays the demographic and pathological data of the osteosarcoma cases. Patients aged 0-24 years accounted for the largest proportion of cases (64.7%), and most patients were white (74.0%) and male (53.9%). The most common tumor site was the lower extremity (58.87%), followed by the axial skeleton (28.81%). Most patients were classified as M0 (78.3%) and grade IV (51.4%). In addition, 86.6% of patients underwent surgical resection and 79.8% received chemotherapy. Differences between the two sets were not significant. (Table 4 and Fig. 3A,B). To assess the concordance between predicted and observed survival, we computed the C-index of the nomograms in the training set. The C-indices of the OS and CSS nomograms (OS, C-index = 0.806; CSS, C-index = 0.807) were higher than those of TNM staging (OS, C-index = 0.686; CSS, C-index = 0.735). A consistent trend was observed in the validation set (Table 4). The similar results in both sets support the accuracy of the nomogram-based model. Calibration curves for the 3- and 5-year OS and CSS nomograms in both sets (Fig. 4) lay close to the ideal reference line, indicating close agreement between predicted and observed survival in both sets.
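A calibration curve compares predicted with observed survival at a fixed time point. The sketch below shows the basic idea: patients are sorted into groups by predicted t-year survival, and each group's mean prediction is paired with its Kaplan-Meier estimate. It is an apparent (non-bootstrapped) version; the 1000-resample bootstrap correction mentioned in the methods is not reproduced, and the number of groups is an arbitrary assumption.

```python
from typing import List, Sequence, Tuple


def km_survival_at(times: Sequence[float], events: Sequence[int], t: float) -> float:
    """Kaplan-Meier estimate of S(t) from right-censored data."""
    surv = 1.0
    event_times = sorted({ti for ti, e in zip(times, events) if e and ti <= t})
    for u in event_times:
        at_risk = sum(1 for ti in times if ti >= u)
        deaths = sum(1 for ti, e in zip(times, events) if e and ti == u)
        surv *= 1.0 - deaths / at_risk
    return surv


def calibration_points(pred_surv: Sequence[float], times: Sequence[float],
                       events: Sequence[int], t: float,
                       n_groups: int = 3) -> List[Tuple[float, float]]:
    """Group patients by predicted S(t); return (mean predicted, KM observed) per group."""
    if not 1 <= n_groups <= len(pred_surv):
        raise ValueError("n_groups must be between 1 and the number of patients")
    order = sorted(range(len(pred_surv)), key=lambda i: pred_surv[i])
    size = len(order) // n_groups
    points = []
    for g in range(n_groups):
        # The last group absorbs any remainder so every patient is used once.
        idx = order[g * size:] if g == n_groups - 1 else order[g * size:(g + 1) * size]
        mean_pred = sum(pred_surv[i] for i in idx) / len(idx)
        observed = km_survival_at([times[i] for i in idx], [events[i] for i in idx], t)
        points.append((mean_pred, observed))
    return points
```

Plotting the returned pairs against the 45° line gives the calibration curve; points close to that line indicate well-calibrated predictions.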
Clinical applications. Net benefit was calculated by DCA to evaluate the clinical usefulness of the nomograms. The nomograms provided greater net benefit than TNM staging across a wide range of threshold probabilities (10-50%) (Fig. 5A,B).
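Decision curve analysis plots net benefit against the threshold probability at which a clinician would intervene. The sketch below uses the standard net benefit formula, NB = TP/n − (FP/n)·pt/(1 − pt), under two simplifying assumptions: the outcome is binary (death by the time horizon, with no censoring before it), and predicted risk equals 1 minus predicted survival. Survival-specific DCA replaces the raw counts with Kaplan-Meier estimates.

```python
from typing import Sequence


def net_benefit(risk: Sequence[float], died: Sequence[int], threshold: float) -> float:
    """Net benefit of intervening when predicted risk >= threshold (binary outcome)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(risk)
    tp = sum(1 for r, d in zip(risk, died) if r >= threshold and d)
    fp = sum(1 for r, d in zip(risk, died) if r >= threshold and not d)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(died: Sequence[int], threshold: float) -> float:
    """Reference strategy: intervene in every patient."""
    prevalence = sum(died) / len(died)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)
```

A model is clinically useful at a given threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy (net benefit 0).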

Discussion
Several factors may influence survival in osteosarcoma; however, these prognostic factors have not previously been integrated. A single prognostic index offers limited accuracy in predicting an individual patient's prognosis. Nomograms are frequently used statistical tools that can predict the probability of a patient's overall survival with high robustness and precision. Kim et al. 23 established a nomogram for predicting the risk of metastasis in patients with nonmetastatic osteosarcoma, which outperformed the tumor necrosis rate or the traditional AJCC classification system alone in predicting distant metastasis (DM). Xia et al. 24 established a nomogram to better estimate the prognosis of osteosarcoma patients who underwent surgery. Because the findings of these studies were not validated, their applicability to other populations may be limited by possible bias.
Kim and colleagues established a nomogram for predicting metastasis in patients with Enneking stage IIB extremity osteosarcoma based on the medical records of 91 patients who underwent surgery. However, their sample size was small, so larger populations are required to validate the generalizability of their nomogram. In this study, integrated and simple prognostic nomograms were constructed using data from 1725 SEER-derived osteosarcoma patients to predict OS and CSS at 3 and 5 years. Furthermore, our nomograms showed C-indices of 0.808 and 0.806, higher than those of most other nomograms for osteosarcoma.
Variables incorporated into the nomogram were classified into two types: (1) clinical factors (age, race, grade, tumor size, tumor site, histological subtype, and TNM stage) and (2) treatment-related factors (surgery, chemotherapy, and/or radiotherapy). In the present work, most patients were aged 0-24 years, accounting for 56.7% and 56.8% of the training and validation sets, respectively.
As shown in Table 1, most patients were white and male, had tumors located in the extremities, and received surgery and chemotherapy. These findings corroborate those of previous studies. To select prognostic factors accurately, univariate and multivariate Cox analyses were conducted to identify independent predictors of OS and CSS. Age, tumor size, tumor site, TNM stage, and grade were adversely associated with OS and CSS, consistent with earlier findings [25][26][27][28] . M stage is an independent prognostic factor for osteosarcoma. Distant metastasis of malignant tumors is widely recognized as an indicator of poor prognosis 23 . However, because few patients were classified as N1 and few outcome events occurred among them, N stage did not emerge as an independent prognostic factor for postoperative outcomes in patients with extremity osteosarcoma when the multivariate Cox regression model was fitted. The TNM classification system has been commonly used to estimate the survival of osteosarcoma, but it captures only part of the risk. In the present work, practical nomograms were established using 13 variables, including age, sex, race, grade, tumor size, tumor site, histological type, TNM stage, surgery, radiotherapy, and chemotherapy. Our nomograms outperformed the traditional TNM classification system in predicting patient survival (C-index: 0.806 vs. 0.686 for OS and 0.807 vs. 0.735 for CSS), suggesting that the TNM classification system alone has limited prognostic value.
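The hazard ratios and 95% CIs reported from Cox regression follow directly from each coefficient and its standard error. The sketch below computes HR = exp(β), the Wald 95% CI exp(β ± 1.96·SE), and a two-sided Wald p-value, then applies a univariate screen. The variable names and the α = 0.05 cut-off are illustrative assumptions; fitting the Cox model itself is left to a survival-analysis package.

```python
import math
from typing import Dict, List, Tuple

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal distribution


def hazard_ratio_summary(beta: float, se: float) -> Tuple[float, Tuple[float, float], float]:
    """HR, Wald 95% CI, and two-sided Wald p-value from a Cox coefficient and its SE."""
    hr = math.exp(beta)
    ci = (math.exp(beta - Z_975 * se), math.exp(beta + Z_975 * se))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return hr, ci, p


def screen_univariate(results: Dict[str, Tuple[float, float]], alpha: float = 0.05) -> List[str]:
    """Keep variables whose univariate Wald p-value is below alpha."""
    return [name for name, (beta, se) in results.items()
            if hazard_ratio_summary(beta, se)[2] < alpha]
```

For example, with hypothetical univariate fits `{"age": (0.7, 0.1), "sex": (0.05, 0.2)}`, only `age` passes the screen, because its z statistic is 7 while that of `sex` is 0.25.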
Notably, our nomograms suggest that appropriate treatment is essential for prolonging patient survival. Before the 1970s, when adjuvant chemotherapy was not yet available, amputation was the main treatment for high-grade osteosarcoma, which severely affected patients' quality of life and survival. With the emergence of adjuvant chemotherapy and limb salvage surgery, patient survival has risen by 50% to about 70%. In the current study, for an 18-year-old black patient with high-grade osteosarcoma of the extremity at stage T3N1M0 (tumor size, 10.0 cm), receiving surgery and chemoradiotherapy could improve the predicted OS and CSS from 20% to 72%.
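Reading a nomogram, as in the patient example above, means summing per-factor points and converting the total into a survival probability. The sketch below illustrates one common construction under the Cox model: each level's coefficient is rescaled so that the widest-ranging predictor spans 0-100 points, and survival at time t is S(t | x) = S0(t)^exp(lp), where lp is the patient's linear predictor. The coefficients, variable names, and baseline survival are hypothetical placeholders rather than the published model, so the sketch does not reproduce the 20% and 72% figures.

```python
import math
from typing import Dict

# Hypothetical log hazard ratios with each reference level at 0 (NOT the study's estimates).
COEFS: Dict[str, Dict[str, float]] = {
    "age": {"0-24": 0.0, "25-59": 0.3, ">59": 1.0},
    "m_stage": {"M0": 0.0, "M1": 1.2},
    "surgery": {"yes": 0.0, "no": 0.9},
}


def points_table(coefs: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Rescale coefficients so the widest-ranging predictor spans 0-100 points."""
    widest = max(max(lv.values()) - min(lv.values()) for lv in coefs.values())
    table = {}
    for var, levels in coefs.items():
        lo = min(levels.values())
        table[var] = {lvl: 100.0 * (b - lo) / widest for lvl, b in levels.items()}
    return table


def total_points(patient: Dict[str, str], table: Dict[str, Dict[str, float]]) -> float:
    """Sum the points for the patient's level of each predictor."""
    return sum(table[var][lvl] for var, lvl in patient.items())


def predicted_survival(patient: Dict[str, str], coefs: Dict[str, Dict[str, float]],
                       baseline_surv_t: float) -> float:
    """S(t | x) = S0(t) ** exp(lp), with S0(t) the survival of the all-reference profile."""
    lp = sum(coefs[var][lvl] for var, lvl in patient.items())
    return baseline_surv_t ** math.exp(lp)
```

Because total points are a linear rescaling of the linear predictor (points = 100·(lp − Σ of each predictor's minimum coefficient) / widest range), the points axis and the survival axes of a printed nomogram carry the same information.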
Compared with previous SEER-based studies of osteosarcoma, we established nomograms for 1-, 3-, and 5-year overall survival (OS) and cancer-specific survival (CSS). By scoring various risk factors (age, race, sex, metastasis, pathological stage, surgery, chemotherapy), our nomograms comprehensively assess patient survival at 1, 3, and 5 years. We also evaluated the clinical utility of our approach through net benefit analysis. However, this study has some limitations. First, some bias may have been introduced by its retrospective design. Second, although our nomograms were constructed using a large sample and validated internally, external validation was lacking; an external validation study is difficult to design because osteosarcoma is rare. Nevertheless, our nomograms effectively and precisely predicted survival for individual osteosarcoma cases.

Conclusions
In this work, our nomograms showed good predictive performance. Our findings suggest that age, tumor site, tumor size, tumor grade, surgery, and TNM stage are independent predictors of OS and CSS in osteosarcoma. These factors were incorporated into nomograms to predict the survival of osteosarcoma cases. Our study presents a reliable and accurate method for predicting the survival of osteosarcoma patients. The nomograms can be used to predict 1-, 3-, and 5-year overall survival (OS) and cancer-specific survival (CSS) for individual osteosarcoma cases, helping surgeons and clinicians evaluate the likelihood of survival and the risk of mortality for each patient.

Data availability
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.